This is a starter RMarkdown template to accompany Data Visualization (Princeton University Press, 2019). You can use it to take notes, write your code, and produce a good-looking, reproducible document that records the work you have done.
At the very top of the file is a section of metadata, or information about what the file is and what it does. The metadata is delimited by three dashes at the start and another three at the end. You should change the title, author, and date to the values that suit you. Keep the output line as it is for now, however. Each line in the metadata has the same structure: first the key (“title”, “author”, etc.), then a colon, and then the value associated with the key.
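To see the key, colon, value structure of the metadata in action, here is a minimal sketch that writes a small file with a metadata header and reads it back with `rmarkdown::yaml_front_matter()`. The title, author, date, and `html_document` output value are placeholder assumptions for illustration, not the values in this template.

```r
library(rmarkdown)

# A tiny R Markdown file: metadata between two lines of three dashes,
# then the body. All values here are illustrative placeholders.
rmd_lines <- c(
  "---",
  "title: \"My Notes\"",
  "author: \"A. Reader\"",
  "date: \"2019-01-01\"",
  "output: html_document",
  "---",
  "",
  "Body text goes here."
)

# Write the file to a temporary location so nothing in your project changes.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Read the metadata back as a named list: one element per key.
meta <- yaml_front_matter(rmd_path)
names(meta)
meta$title
```

Each key in the header becomes a name in the list, and the text after the colon becomes its value.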
Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks within the document. A code chunk is a specially delimited section of the file. You can add one by moving the cursor to a blank line and choosing Code > Insert Chunk from the RStudio menu. When you do, an empty chunk will appear:
Code chunks are delimited by three backticks (found to the left of the 1 key on US and UK keyboards) at the start and end. The opening backticks also have a pair of braces and the letter r, to indicate what language the chunk is written in. You write your code inside the code chunks. Write your notes and other material around them, as here.
To install the tidyverse, make sure you have an Internet connection. Then manually run the code in the chunk below. If you knit the document, it will be skipped. We do this because you only need to install these packages once, not every time you run this file. Either run the chunk using the little green “play” arrow to the right of the chunk area, or copy and paste the text into the console window.
## This code will not be evaluated automatically.
## (Notice the eval = FALSE declaration in the options section of the
## code chunk)
my_packages <- c("tidyverse", "broom", "coefplot", "cowplot",
"gapminder", "GGally", "ggrepel", "ggridges", "gridExtra",
"here", "interplot", "margins", "maps", "mapproj",
"mapdata", "MASS", "quantreg", "rlang", "scales",
"survey", "srvyr", "viridis", "viridisLite", "devtools")
install.packages(my_packages, repos = "http://cran.rstudio.com")
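Because the packages only need to be installed once, one possible refinement is to check which ones are already present and install only the rest. The sketch below is an assumption about how you might do that, not part of the original template. The helper name `missing_packages()` is hypothetical, and the `interactive()` guard keeps the install step from running when the document is knit.

```r
# Hypothetical helper: return the packages in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Check a few of the packages from the list above.
to_install <- missing_packages(c("tidyverse", "gapminder", "ggrepel"))

# Only install when run by hand in an interactive session,
# and only if something is actually missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, repos = "http://cran.rstudio.com")
}
```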
To begin we must load some libraries we will be using. If we do not load them, R will not be able to find the functions contained in these libraries. The tidyverse includes ggplot and other tools. We also load the socviz and gapminder libraries.
Notice that here, the braces at the start of the code chunk have some additional options set in them. There is the language, r, as before. This is required. Then there is the word setup, which is a label for your code chunk. Labels are useful to briefly say what the chunk does. Label names must be unique (no two chunks in the same document can have the same label) and cannot contain spaces. Then, after the comma, an option is set: include=FALSE. This tells R to run this code but not to include the output in the final document.
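As a rough illustration of labels and the `include=FALSE` option, the sketch below builds a two-chunk document as text and knits it with `knitr::knit()`. The chunk contents and the label `show-x` are made up for this example. The first chunk runs but leaves no trace in the output, while the second chunk can still use the value it created.

```r
library(knitr)

# Three backticks, built programmatically so they can sit inside this example.
fence <- strrep("`", 3)

# A small document: a hidden setup chunk, then a chunk that prints x.
doc <- c(
  "Some notes.",
  "",
  paste0(fence, "{r setup, include=FALSE}"),
  "x <- 21 * 2",
  fence,
  "",
  paste0(fence, "{r show-x}"),
  "x",
  fence
)

# Knit the text; the result is the output document as a single string.
out <- knit(text = doc, quiet = TRUE)
cat(out)
```

The code from the `setup` chunk does not appear in the result, but the printed value of `x` from the second chunk does.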
You can embed an R code chunk like this:
gapminder
## # A tibble: 1,704 x 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## # … with 1,694 more rows
The remainder of this document contains the chapter headings for the book, and an empty code chunk in each section to get you started. Try knitting this document now by clicking the “Knit” button in the RStudio toolbar, or choosing File > Knit Document from the RStudio menu.
p <- ggplot(data = gapminder, mapping = aes(x=gdpPercap, y=lifeExp))
p + geom_point()
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point()
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method="lm")
## `geom_smooth()` using formula 'y ~ x'
…“an ill-advised linear fit”
In the plot, data is bunched up against the left side. The x-scale would probably look better if it were converted from a linear scale to a log scale, using the function scale_x_log10().
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method="gam") + scale_x_log10()
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
Let’s tidy up the axes.
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point() + geom_smooth(method="gam") + scale_x_log10(labels = scales::dollar)
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
Adding some colour
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(color="purple") + geom_smooth(method="loess") + scale_x_log10(labels = scales::dollar)
## `geom_smooth()` using formula 'y ~ x'
Removing the standard error ribbon (se)
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(alpha=0.3) + geom_smooth(color="orange", se = FALSE, size=1, method="lm") + scale_x_log10(labels = scales::dollar)
## `geom_smooth()` using formula 'y ~ x'
Fixing the labels
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(alpha=0.3) + geom_smooth(method="gam") + scale_x_log10(labels = scales::dollar) + labs(x="GDP Per Capita", y="Life Expectancy in Years", title="Economic Growth and Life Expectancy", subtitle="Data points are country-years", caption="Source: Gapminder")
## `geom_smooth()` using formula 'y ~ s(x, bs = "cs")'
Add continent information
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color=continent))
p + geom_point(alpha=0.3) + geom_smooth(method="loess") + scale_x_log10(labels = scales::dollar) + labs(x="GDP Per Capita", y="Life Expectancy in Years", title="Economic Growth and Life Expectancy", subtitle="Data points are country-years", caption="Source: Gapminder")
## `geom_smooth()` using formula 'y ~ x'
Colouring the SE shading.
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp, color=continent, fill=continent))
p + geom_point(alpha=0.3) + geom_smooth(method="loess") + scale_x_log10(labels = scales::dollar) + labs(x="GDP Per Capita", y="Life Expectancy in Years", title="Economic Growth and Life Expectancy", subtitle="Data points are country-years", caption="Source: Gapminder")
## `geom_smooth()` using formula 'y ~ x'
Using a single smoother line for all continents
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(mapping=aes(color=continent)) + geom_smooth(method="loess") + scale_x_log10(labels = scales::dollar) + labs(x="GDP Per Capita", y="Life Expectancy in Years", title="Economic Growth and Life Expectancy", subtitle="Data points are country-years", caption="Source: Gapminder")
## `geom_smooth()` using formula 'y ~ x'
ggsave(filename="my_figure.png")
## Saving 8 x 5 in image
## `geom_smooth()` using formula 'y ~ x'
Mapping continuous variables to colour
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_point(mapping=aes(color=log(pop))) + geom_smooth(method="loess") + scale_x_log10(labels = scales::dollar) + labs(x="GDP Per Capita", y="Life Expectancy in Years", title="Economic Growth and Life Expectancy", subtitle="Data points are country-years", caption="Source: Gapminder")
## `geom_smooth()` using formula 'y ~ x'
Putting ‘smooth’ before ‘point’
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp))
p + geom_smooth(method="loess") + geom_point(mapping=aes(color=continent)) + scale_x_log10(labels = scales::dollar) + labs(x="GDP Per Capita", y="Life Expectancy in Years", title="Economic Growth and Life Expectancy", subtitle="Data points are country-years", caption="Source: Gapminder")
## `geom_smooth()` using formula 'y ~ x'
ggsave(filename="my_figure.png")
## Saving 8 x 5 in image
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data=gapminder, mapping=aes(x=year, y=gdpPercap))
p + geom_line(aes(group=country))
Using Facet to make small multiples
p <- ggplot(data=gapminder, mapping=aes(x = year, y = gdpPercap))
p + geom_line(aes(group=country)) + facet_wrap(~continent)
Arranging the facets
p <- ggplot(data = gapminder, mapping = aes(x = year, y = gdpPercap))
p + geom_line(color="gray70", aes(group = country)) + geom_smooth(size = 1.1, method = "loess", se = FALSE) + scale_y_log10(labels=scales::dollar) + facet_wrap(~continent, ncol = 5) + labs(x = "Year", y = "GDP per capita", title = "GDP per capita on Five Continents")
## `geom_smooth()` using formula 'y ~ x'
p <- ggplot(data = gss_sm, mapping=aes(x = age, y=childs))
p + geom_point(alpha = 0.2) + geom_smooth() + facet_grid( sex ~ race)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
p <- ggplot(data = gss_sm, mapping=aes(x = bigregion))
p + geom_bar()
p <- ggplot(data = gss_sm, mapping=aes(x = bigregion))
p + geom_bar(mapping = aes(y = ..prop.., group = 1))
p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar()
p <- ggplot(data = gss_sm, mapping = aes(x = religion, fill = religion))
p + geom_bar() + guides(fill = "none")
p <- ggplot(data=gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar()
p <- ggplot(data=gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position="fill")
p <- ggplot(data=gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position="dodge")
p <- ggplot(data=gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position="dodge", mapping=aes(y=..prop..))
p <- ggplot(data=gss_sm, mapping = aes(x = bigregion, fill = religion))
p + geom_bar(position="dodge", mapping=aes(y=..prop.., group=religion))
p <- ggplot(data=gss_sm, mapping = aes(x = religion))
p + geom_bar(position="dodge", mapping=aes(y=..prop.., group=bigregion)) + facet_wrap(~bigregion, ncol=2)
p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p <- ggplot(data = midwest, mapping = aes(x = area))
p + geom_histogram(bins=10)
oh_wi <- c("OH", "WI")
p <- ggplot(data = subset(midwest, subset = state %in% oh_wi), mapping = aes(x=percollege, fill=state))
p + geom_histogram(alpha = 0.4, bins = 20)
p <- ggplot(data = midwest, mapping = aes(x=area))
p + geom_density()
p <- ggplot(data = midwest, mapping = aes(x=area, fill=state, color=state))
p + geom_density(alpha = 0.3)
p <- ggplot(data=subset(midwest, subset = state %in% oh_wi), mapping = aes(x=area, fill=state, color=state))
p + geom_density(alpha = 0.3, mapping = aes(y = ..scaled..))
p <- ggplot(data=titanic, mapping = aes(x = fate, y = percent, fill = sex))
p + geom_bar(position = "dodge", stat = "identity") + theme(legend.position = "top")
p <- ggplot(data = oecd_sum, mapping = aes( x= year, y = diff, fill = hi_lo))
p + geom_col() + guides(fill = "none") + labs(x = NULL, y = "Difference in Years", title = "The US Life Expectancy Gap", subtitle = "Difference between US and OECD average life expectancies, 1960-2015", caption = "Data: OECD. After a chart by Christopher Ingraham, Washington Post, December 17th, 2017.")
## Warning: Removed 1 rows containing missing values (position_stack).
data <- read.csv("Keynote.csv")
p <- ggplot(data = data, mapping = aes(x = year, y = amount))
p + geom_line(aes(group = country)) + facet_wrap(~ country, ncol=4)
rel_by_region <- gss_sm %>% group_by(bigregion, religion) %>% summarize(N = n()) %>% mutate(freq = N / sum(N), pct = round((freq * 100), 0))
## Warning: Factor `religion` contains implicit NA, consider using
## `forcats::fct_explicit_na`
rel_by_region
## # A tibble: 24 x 5
## # Groups: bigregion [4]
## bigregion religion N freq pct
## <fct> <fct> <int> <dbl> <dbl>
## 1 Northeast Protestant 158 0.324 32
## 2 Northeast Catholic 162 0.332 33
## 3 Northeast Jewish 27 0.0553 6
## 4 Northeast None 112 0.230 23
## 5 Northeast Other 28 0.0574 6
## 6 Northeast <NA> 1 0.00205 0
## 7 Midwest Protestant 325 0.468 47
## 8 Midwest Catholic 172 0.247 25
## 9 Midwest Jewish 3 0.00432 0
## 10 Midwest None 157 0.226 23
## # … with 14 more rows
p <- ggplot(rel_by_region, aes(x = bigregion, y = pct, fill=religion))
p + geom_col(position = "dodge2") + labs(x= "Region", y = "Percent", fill = "Religion") + theme(legend.position = "top")
p <- ggplot(rel_by_region, aes(x = religion, y = pct, fill=religion))
p + theme(panel.background = element_rect(fill = 'white', color="black")) + geom_col(position = "dodge2") + labs(x= NULL, y = "Percent", fill = "Religion") + guides(fill = "none") + coord_flip() + facet_grid(~ bigregion)
organdata %>% select(1:6) %>% sample_n(size = 10)
## # A tibble: 10 x 6
## country year donors pop pop_dens gdp
## <chr> <date> <dbl> <int> <dbl> <int>
## 1 Spain 1996-01-01 26.8 39279 7.76 16416
## 2 Ireland 1994-01-01 20.3 3590 5.11 15990
## 3 United States 1993-01-01 18.7 259919 2.70 25327
## 4 Finland NA NA 4986 1.47 18025
## 5 Denmark 1998-01-01 11 5304 12.3 25537
## 6 Sweden 1991-01-01 16.4 8617 1.92 19000
## 7 Germany 1991-01-01 13.3 80014 22.4 17511
## 8 France 1993-01-01 17.1 57467 10.4 19763
## 9 Canada 2001-01-01 13.5 31111 0.312 29235
## 10 Ireland NA NA 3514 5.00 12917
p <- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_point()
## Warning: Removed 34 rows containing missing values (geom_point).
p <- ggplot(data = organdata, mapping = aes(x = year, y = donors))
p + geom_line(aes(group = country)) + facet_wrap(~country, ncol = 4)
## Warning: Removed 34 row(s) containing missing values (geom_path).
p <- ggplot(data = organdata, mapping=aes(x=country, y = donors))
p + geom_boxplot()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping=aes(x=country, y = donors))
p + geom_boxplot() + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping=aes(x=reorder(country, donors, na.rm = TRUE), y = donors))
p + geom_boxplot() + labs(x=NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping=aes(x=reorder(country, donors, na.rm = TRUE), y = donors))
p + geom_boxplot() + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).
p <- ggplot(data = organdata, mapping=aes(x=reorder(country, donors, na.rm = TRUE), y = donors))
p + geom_violin() + labs(x=NULL) + coord_flip()
## Warning: Removed 34 rows containing non-finite values (stat_ydensity).
p <- ggplot(data = organdata, mapping=aes(x=reorder(country, donors, na.rm = TRUE), y = donors, fill=world))
p + geom_boxplot() + labs(x=NULL) + coord_flip() + theme(legend.position="top")
## Warning: Removed 34 rows containing non-finite values (stat_boxplot).